35 research outputs found
Identifying Similar Meaning In Word Sequences
A system for determining semantic similarity of phrases (word sequences) uses weak supervision of neural networks to generate an embedding space for determining phrase similarity. The training of the neural networks uses input from two orthogonal sources: click distribution of queries and phrase translations
Hybrid robust deep and shallow semantic processing for creativity support in document production
The research performed in the DeepThought project (http://www.project-deepthought.net) aims at demonstrating the potential of deep linguistic processing if added to existing shallow methods that ensure robustness. Classical information retrieval is extended by high precision concept indexing and relation detection. We use this approach to demonstrate the feasibility of three ambitious applications, one of which is a tool for creativity support in document production and collective brainstorming. This application is described in detail in this paper. Common to all three applications, and the basis for their development is a platform for integrated linguistic processing. This platform is based on a generic software architecture that combines multiple NLP components and on robust minimal recursive semantics (RMRS) as a uniform representation language
Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure
It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across families. Second, and more interestingly, we provide an algorithm for inducing cross-lingual clusters and we show that features derived from these clusters significantly improve the accuracy of cross-lingual structure prediction. Specifically, we show that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred to foreign languages, can be reduced by up to 13%. When applying the same method to direct transfer of named-entity recognizers, we observe relative improvements of up to 26%
Neural Paraphrase Identification of Questions with Noisy Pretraining
We present a solution to the problem of paraphrase identification of
questions. We focus on a recent dataset of question pairs annotated with binary
paraphrase labels and show that a variant of the decomposable attention model
(Parikh et al., 2016) results in accurate performance on this task, while being
far simpler than many competing neural architectures. Furthermore, when the
model is pretrained on a noisy dataset of automatically collected question
paraphrases, it obtains the best reported performance on the dataset